An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data

Author: A Abyzov
A Abyzov
B Langmead
C Alkan
Consortium TGP
H Li
HYK Lam
J Korbel
J Wang
J Zhang
Jiayin Wang
Jin Zhang
K Chen
K Ye
R Mills
RE Handsaker
S Sindi
Yufeng Wu
ZD Zhang
Publication venue: BioMed Central
Publication date: 19/04/2012

Crossref

Springer - Publisher Connector

PubMed Central

svclassify: a method to establish benchmark structural variant calls

Author: A Abyzov
A Abyzov
A Kong
AC English
AR Quinlan
B Schölkopf
C Alkan
C Lee
Desu Chen
DMJ Tax
Gabor Bartha
GR Abecasis
H Li
H Li
Hariharan Iyer
Hemang Parikh
Hugo Y. K. Lam
HYK Lam
HYK Lam
JB Burbidge
JH Ward Jr
JM Zook
JT Robinson
Justin M. Zook
K Chen
K Wong
K Ye
M Mohiyuddin
M Yousef
Marc Salit
Marghoob Mohiyuddin
Mark Pratt
MM Deza
N Cristianini
N Spies
Noah Spies
RE Mills
RM Layer
SS Khan
TF Cox
Wolfgang Losert
Publication venue: Springer Nature
Publication date: 16/01/2016

The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz.
We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.
https://doi.org/10.1186/s12864-016-2366-
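The one-class idea described in this abstract can be illustrated with a small sketch. The model learns what typical annotations look like from random non-SV regions and flags candidates whose annotations deviate from that background. This is a hedged illustration and not svclassify's implementation. The following choices are all assumptions made for the example:

- the distance (a squared Mahalanobis distance with an assumed diagonal covariance, i.e. a sum of squared z-scores);
- the 99th-percentile cutoff;
- the two hypothetical annotations (normalized read depth and fraction of discordant read pairs);
- the function names.

```python
from statistics import mean, pstdev


def fit_background(regions):
    """Estimate per-annotation mean and standard deviation.

    `regions` is a list of equal-length annotation vectors computed on
    random regions assumed not to contain SVs (the one-class training set).
    """
    columns = list(zip(*regions))
    means = [mean(col) for col in columns]
    # Guard against zero variance so the z-scores stay finite.
    stds = [pstdev(col) or 1e-9 for col in columns]
    return means, stds


def abnormality_score(vector, means, stds):
    """Squared Mahalanobis distance under an assumed diagonal covariance."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(vector, means, stds))


def score_cutoff(regions, means, stds, quantile=0.99):
    """Cutoff = the given quantile of scores on the background regions."""
    scores = sorted(abnormality_score(r, means, stds) for r in regions)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


# Hypothetical annotations: [normalized depth, fraction of discordant pairs].
background = [[1.0, 0.01], [0.9, 0.02], [1.1, 0.00], [1.05, 0.01], [0.95, 0.02]]
means, stds = fit_background(background)
cutoff = score_cutoff(background, means, stds)

candidate_deletion = [0.1, 0.30]  # low depth, many discordant pairs
print(abnormality_score(candidate_deletion, means, stds) > cutoff)  # True
```

A real training set would contain far more regions. With only five regions, the quantile cutoff here simply equals the largest background score.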

Crossref

PubMed Central

Digital Repository at the University of Maryland

svclassify: a method to establish benchmark structural variant calls

Author: A Abyzov
A Abyzov
A Kong
AC English
AR Quinlan
B Schölkopf
C Alkan
C Lee
Desu Chen
DMJ Tax
Gabor Bartha
GR Abecasis
H Li
H Li
Hariharan Iyer
Hemang Parikh
Hugo Y. K. Lam
HYK Lam
HYK Lam
JB Burbidge
JH Ward Jr
JM Zook
JT Robinson
Justin M. Zook
K Chen
K Wong
K Ye
M Mohiyuddin
M Yousef
Marc Salit
Marghoob Mohiyuddin
Mark Pratt
MM Deza
N Cristianini
N Spies
Noah Spies
RE Mills
RM Layer
SS Khan
TF Cox
Wolfgang Losert
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Fast and accurate mutation detection in whole genome sequences of multiple isogenic samples with IsoMut

Author: A Lagerqvist
A McKenna
A. Bodor
D. Ribli
D. Szüts
DC Koboldt
DC Koboldt
EJ Duncavage
F Meacham
G. E. Tusnády
GE Johnson
GG Faust
H Li
H Li
H Li
HYK Lam
I Kinde
I. Csabai
J Molnár
J. Molnár
JT Robinson
K Cibulskis
K Mortelmans
K Nakamura
KB Dahlman
M Forster
M. Krzystanek
MS Lawrence
O. Pipek
P Flicek
PJ Campbell
R Nielsen
ST Sherry
V Grossmann
V Lazar
W Sakai
Z. Szallasi
Á. Póti
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Background: Detection of somatic mutations is one of the main goals of next-generation DNA sequencing. A wide range of experimental systems are available for the study of spontaneous or environmentally induced mutagenic processes. However, most of the routinely used mutation calling algorithms are not optimised for the simultaneous analysis of multiple samples, or for non-human experimental model systems with no reliable databases of common genetic variations. Most standard tools either require numerous in-house post-filtering steps with scarce documentation or take an impractically long time to run. To overcome these problems, we designed the streamlined IsoMut tool, which can be readily adapted to experimental scenarios where the goal is the identification of experimentally induced mutations in multiple isogenic samples. Methods: Using 30 isogenic samples, reliable cohorts of validated mutations were created for testing purposes. Optimal values of the filtering parameters of IsoMut were determined in a thorough and strict optimization procedure based on these test sets. Results: We show that IsoMut, when tuned correctly, decreases the false positive rate compared to conventional tools in a 30-sample experimental setup, and detects not only single nucleotide variations but also short insertions and deletions. IsoMut can also be run more than a hundred times faster than the most precise state-of-the-art tool, owing to its straightforward and easily understandable filtering algorithm. Conclusions: IsoMut has already been successfully applied in multiple recent studies to find unique, treatment-induced mutations in sets of isogenic samples with very low false positive rates. These types of studies provide an important contribution to determining the mutagenic effect of environmental agents or genetic defects, and IsoMut turned out to be an invaluable tool in the analysis of such data. © 2017 The Author(s)
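The core filtering idea behind finding treatment-induced mutations in isogenic samples can be sketched as follows. A candidate position is kept only when exactly one sample carries the non-reference base and every other sample looks like reference at adequate coverage. This is a simplified illustration, not IsoMut's actual algorithm. The thresholds (`min_depth`, `min_alt_freq`, `max_other_freq`) and the `PileupCounts` type are hypothetical names with arbitrary default values. They are not IsoMut's parameters or its optimized settings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PileupCounts:
    depth: int      # reads covering the position in one sample
    alt_reads: int  # reads supporting the candidate non-reference base


def unique_mutation_sample(
    counts: List[PileupCounts],
    min_depth: int = 10,         # hypothetical threshold
    min_alt_freq: float = 0.3,   # hypothetical threshold
    max_other_freq: float = 0.0, # hypothetical threshold
) -> Optional[int]:
    """Return the index of the only sample carrying the variant, else None."""
    # Require coverage everywhere, so "absent" means "looked and saw reference".
    if any(c.depth < min_depth for c in counts):
        return None
    freqs = [c.alt_reads / c.depth for c in counts]
    carriers = [i for i, f in enumerate(freqs) if f >= min_alt_freq]
    if len(carriers) != 1:
        return None  # absent everywhere, or shared (e.g. a background variant)
    hit = carriers[0]
    # Every other isogenic sample must look like reference at this position.
    if any(f > max_other_freq for i, f in enumerate(freqs) if i != hit):
        return None
    return hit


treated = [PileupCounts(30, 0), PileupCounts(28, 12), PileupCounts(35, 0)]
shared = [PileupCounts(30, 15), PileupCounts(28, 14), PileupCounts(35, 17)]
print(unique_mutation_sample(treated))  # 1
print(unique_mutation_sample(shared))   # None
```

Comparing samples against each other, rather than against a database of known variants, is what makes this style of filter usable for non-human model systems. The abstract notes that such databases are often lacking for those systems.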

Crossref

Springer - Publisher Connector

Harvard University - DASH

PubMed Central

Repository of the Academy's Library

Semmelweis Repository

ELTE Digital Institutional Repository (EDIT)

Online Research Database In Technology

Lessons from the CAGI-4 Hopkins clinical panel challenge

Author: Adhikari A
Buckley BA
Carraro M
Chandonia J-M
Chhibber A
Cutting GR
Fu Y
Gasparini A
Jones DT
Kramer A
Kundu K
Lam HYK
Leonardi E
Moult J
Pal LR
Searls DB
Shah S
Sunyaev S
Tosatto SCE
Yin Y
Publication date: 01/01/2017

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of fourteen possible classes of disease and one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of 63 patients (62%). Each prediction group correctly diagnosed at least one patient who was not successfully diagnosed by any other group. We discuss the causal variant predictions by the different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false positive rate of DNA-guided analysis in the absence of prior phenotypic indication.

UCL Discovery

eScholarship - University of California

Archivio istituzionale della ricerca - Università di Padova

Brede Tools and Federating Online Neuroinformatics Databases

Author: A Gupta
A Hammers
AF Hamilton
AR Laird
C Svarer
D Ferrucci
DC Essen Van
DW Shattuck
Finn Årup Nielsen
FÅ Nielsen
FÅ Nielsen
FÅ Nielsen
FÅ Nielsen
FÅ Nielsen
FÅ Nielsen
FÅ Nielsen
HYK Lam
J Hartung
JA Turner
JP Shaffer
JW Bohland
KH Cheung
L French
L Marenco
LN Soldatova
M Bota
M Fenner
M Krötzsch
MJ Kempton
MJ Kempton
MJ Kempton
N Ashish
N Tzourio-Mazoyer
P Miller
PE Turkeltaub
PT Fox
PT Fox
R Kötter
R Kötter
RA Poldrack
T Berners-Lee
T Yarkoni
WJ Bug
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

Online Research Database In Technology

NeuroRDF: semantic integration of highly curated data to prioritize biomarker candidates in Alzheimer's disease

Crossref

Population genetic analysis of bi-allelic structural variants from low-coverage sequence data with an expectation-maximization algorithm

Author: A Abyzov
A Martínez-Fundichely
AR Quinlan
BS Weir
C Stewart
CA Buerkle
Cristina Aguado
CW Whelan
David Vicente-Salvador
E Gazave
E Karakoc
ES Lander
F Hormozdiari
G Bhatia
GR Abecasis
H Li
H Li
H Li
H Shao
HYK Lam
J Berglund
J Wang
JC Venter
JJ Michaelson
JM Kidd
José Ignacio Lucas-Lledó
K Chen
KJ McKernan
M Cáceres
M Muñoz Amatriaín
M Nei
Mario Cáceres
PD Keightley
PH Sudmant
R Li
R Nielsen
R Xi
RB Corbett-Detig
RE Handsaker
RE Mills
S Girirajan
S Levy
SM Ahn
SS Sindi
SY Kim
T Zichner
V Guryev
W Huang
X Li
Y Wang
Z Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: Population genetics and association studies usually rely on a set of known variable sites that are then genotyped in subsequent samples, because it is easier to genotype than to discover the variation. This is also true for structural variation detected from sequence data. However, the genotypes at known variable sites can only be inferred with uncertainty from low-coverage data. Thus, statistical approaches that infer genotype likelihoods, test hypotheses, and estimate population parameters without requiring accurate genotypes are becoming popular. Unfortunately, the current implementations of these methods are intended to analyse only single nucleotide and short indel variation, and they usually assume that the two alleles in a heterozygous individual are sampled with equal probability. This is generally false for structural variants detected with paired ends or split reads. Therefore, the population genetics of structural variants cannot be studied unless painstaking and potentially biased genotyping is performed first. Results: We present svgem, an expectation-maximization implementation to estimate allele and genotype frequencies, calculate genotype posterior probabilities, and test for Hardy-Weinberg equilibrium and for population differences, from the numbers of times the alleles are observed in each individual. Although applicable to single nucleotide variation, it aims at bi-allelic structural variation of any type, observed by either split reads or paired ends, with arbitrarily high allele sampling bias. We test svgem with simulated and real data from the 1000 Genomes Project. Conclusions: svgem makes it possible to use low-coverage sequencing data to study the population distribution of structural variants without having to know their genotypes.
Furthermore, this advance allows the combined analysis of structural and nucleotide variation within the same genotype-free statistical framework, thus preventing biases introduced by genotype imputation.
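The expectation-maximization idea in this abstract can be illustrated with a minimal sketch. It estimates genotype frequencies directly from per-individual allele observation counts, without calling genotypes. The sketch makes several assumptions that are not taken from svgem itself:

- Each individual's observations are modeled as binomial draws.
- The heterozygote sampling bias (`bias`) and a small error rate (`error`) are treated as known inputs rather than estimated.
- The function names are hypothetical.

Genotype frequencies are estimated freely rather than forced into Hardy-Weinberg proportions, so an output like this could, in principle, feed an HWE test.

```python
def genotype_likelihoods(ref_obs, alt_obs, bias=0.5, error=0.01):
    """P(observations | genotype) for 0, 1 or 2 copies of the alt allele.

    `bias` is the assumed-known probability that an observation from a
    heterozygote supports the alternative allele (0.5 = no sampling bias).
    The binomial coefficient is omitted because it cancels in posteriors.
    """
    p_alt = (error, bias, 1.0 - error)
    return [q ** alt_obs * (1.0 - q) ** ref_obs for q in p_alt]


def em_genotype_frequencies(counts, bias=0.5, error=0.01, iters=200, tol=1e-10):
    """EM estimate of genotype frequencies (f0, f1, f2) from per-individual
    (ref_obs, alt_obs) counts, plus the implied alt allele frequency."""
    liks = [genotype_likelihoods(r, a, bias, error) for r, a in counts]
    freqs = [1 / 3] * 3  # uninformative starting point
    for _ in range(iters):
        # E-step: posterior probability of each genotype for each individual.
        totals = [0.0, 0.0, 0.0]
        for lik in liks:
            joint = [f * l for f, l in zip(freqs, lik)]
            z = sum(joint)
            for g in range(3):
                totals[g] += joint[g] / z
        # M-step: new genotype frequencies are the average posteriors.
        new = [t / len(liks) for t in totals]
        converged = max(abs(a - b) for a, b in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    alt_allele_freq = freqs[1] / 2 + freqs[2]
    return freqs, alt_allele_freq


# Hypothetical (ref_obs, alt_obs) counts for seven individuals.
counts = [(8, 0), (10, 0), (6, 0), (8, 2), (7, 2), (9, 1), (0, 5)]
freqs, p = em_genotype_frequencies(counts, bias=0.2)
print([round(f, 3) for f in freqs], round(p, 3))
```

Modeling the bias explicitly changes how ambiguous individuals are weighted. For counts (9, 1), the likelihood favors the heterozygous genotype when `bias=0.2`, but favors homozygous reference when `bias=0.5`.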

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

PubMed Central

Diposit Digital de Documents de la UAB

Functional impact and evolution of a novel human polymorphic inversion that disrupts a gene and creates a fusion transcript

Author: A Auton
A Lupo
A Martínez-Fundichely
A Seguin-Orlando
A Untergasser
A Webb
AA Bazzini
AA Hoffmann
AJ Sharp
AWC Pang
BF Voight
BL Browning
C Aguado
C Terao
C Terao
C Trapnell
Carla Giner-Delgado
CB Krimbas
Chikashi Terao
D Lakich
D-J Kleinjan
David Castellano
David Izquierdo
DB Lowry
DE Bauer
DG MacArthur
DM Altshuler
DW Huang
E Hasson
E Tuzun
F Antonacci
F Imsland
F Tajima
FC Jones
Fumihiko Matsuda
GR Abecasis
H Li
H Stefansson
HYK Lam
J Blake
J Ma
J Sambrook
JA Tennessen
JC Barrett
JI Lucas Lledó
JI Lucas-Lledó
JK Pritchard
JM Alves
JM Kidd
JM Kidd
JO Korbel
Joshua M. Akey
José Ignacio Lucas-Lledó
JR González
JW Thomas
K Yoshimura
KG Ardlie
KM Steinberg
L Deng
L Skipper
L-P Wong
LA Lettice
Lorena Pantano
M Joron
M Kirkpatrick
M Kirkpatrick
M Nei
M Puig
M Puig
M Raghavan
M Stoneking
Magdalena Gayà-Vidal
Mario Cáceres
Marta Puig
MC Maher
MC Zody
MI Love
MJ Thompson
ML Bondeson
MPA Salm
P Sulem
PA Umina
R Tanaka
R van der Kant
RD Hernandez
S Anders
S De Jong
S De Rubeis
S Girirajan
S Gravel
S Peischl
S Petrovski
S Schiffels
S Subramanian
T Lappalainen
TR Cech
Tõnu Esko
X Guo
ZG Han
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

Since the discovery of chromosomal inversions almost 100 years ago, how they are maintained in natural populations has been a highly debated issue. One of the hypotheses is that inversion breakpoints could affect genes and modify gene expression levels, although evidence of this came only from laboratory mutants. In humans, a few inversions have been shown to be associated with expression differences, but in all cases the molecular causes have remained elusive. Here, we have carried out a complete characterization of a new human polymorphic inversion and determined that it is specific to East Asian populations. In addition, we demonstrate that it disrupts the ZNF257 gene and, through the translocation of the first exon and regulatory sequences, creates a previously nonexistent fusion transcript. Together, these changes are associated with expression changes in several other genes. Finally, we investigate the potential evolutionary and phenotypic consequences of the inversion, and suggest that it is probably deleterious. This is therefore the first example of a natural polymorphic inversion that has position effects and creates a new chimeric gene, helping to answer an old question in evolutionary biology.

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Harvard University - DASH

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Directory of Open Access Journals

PubMed Central

Kyoto University Research Information Repository

Diposit Digital de Documents de la UAB

FigShare